
 basis function


From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Neural Information Processing Systems

We study a shallow variant of XNet, a neural architecture whose activation functions are derived from the Cauchy integral formula. While prior work focused on deep variants, we show that even a single-layer XNet exhibits near-exponential approximation rates, exceeding the polynomial bounds of MLPs and spline-based networks such as Kolmogorov-Arnold Networks (KANs). Empirically, XNet reduces approximation error by over 600× on discontinuous functions, achieves up to 20,000× lower residuals in physics-informed PDEs, and improves policy accuracy and sample efficiency in PPO-based reinforcement learning, while maintaining comparable or better computational efficiency than KAN baselines. These results demonstrate that expressive approximation can stem from principled activation design rather than depth alone, offering a compact, theoretically grounded alternative for function approximation, scientific computing, and control.
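A minimal sketch of where a Cauchy-derived activation can come from, under assumptions the abstract does not state: discretizing the Cauchy integral formula yields a sum of kernels 1/(ξ_k − x) with complex nodes, here taken as ξ_k = c_k − i·d_k, and the real and imaginary parts of each kernel are rational functions of x. The shallow model below is a learned linear combination of those parts. The names cauchy_features and xnet_shallow, the node parameterization, and the omission of training are illustrative choices, not the authors' implementation.

```python
def cauchy_features(x, centers, widths):
    """Real and imaginary parts of 1 / (xi - x) for nodes xi = c - i*d.

    Since xi - x = (c - x) - i*d, the kernel equals
    ((c - x) + i*d) / ((x - c)^2 + d^2), i.e. two rational features of x.
    """
    feats = []
    for c, d in zip(centers, widths):
        denom = (x - c) ** 2 + d ** 2
        feats.append(((c - x) / denom, d / denom))
    return feats


def xnet_shallow(x, centers, widths, a, b, bias=0.0):
    """One hidden layer: weighted sum of Cauchy-kernel features (hypothetical form)."""
    out = bias
    for (re, im), a_k, b_k in zip(cauchy_features(x, centers, widths), a, b):
        out += a_k * re + b_k * im
    return out


# Usage: two nodes with hand-picked, untrained weights.
y = xnet_shallow(0.3, centers=[0.0, 1.0], widths=[0.5, 0.5], a=[1.0, -1.0], b=[0.2, 0.2])
```

In a trained model the centers, widths, and output weights would all be fitted; keeping d_k > 0 keeps every denominator strictly positive.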


A Driving-Style-Adaptive Framework for Vehicle Trajectory Prediction

Neural Information Processing Systems

Vehicle trajectory prediction serves as a critical enabler for autonomous navigation and intelligent transportation systems. While existing approaches predominantly focus on pattern extraction and vehicle-environment interaction modeling, they share a fundamental limitation: they do not address the trajectory heterogeneity that originates from human driving styles. This oversight constrains prediction reliability in complex real-world scenarios. To bridge this gap, we propose the Driving-Style-Adaptive (DSA) framework, which establishes the first systematic integration of heterogeneous driving behaviors into trajectory prediction models. Specifically, our framework employs a set of basis functions tailored to each driving style to approximate the trajectory patterns. By dynamically combining these basis functions and adaptively adjusting their degree, DSA not only enhances prediction accuracy but also provides explanatory insights into the prediction process. Extensive experiments on public real-world datasets demonstrate that the DSA framework outperforms state-of-the-art methods.
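The abstract does not specify the basis family or how the styles are combined. As a purely illustrative sketch, assume each driving style s owns a polynomial basis of its own degree K_s with coefficients c_{s,k}, and that the prediction mixes the per-style curves with softmax weights. The function style_trajectory, the polynomial basis, and the softmax mixing are all hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def style_trajectory(t, style_coeffs, style_logits):
    """Predicted 1-D position at future time t (hypothetical model).

    style_coeffs[s] holds polynomial coefficients c_{s,0..K_s}; the length of
    each list sets that style's degree, so degrees may differ across styles.
    """
    weights = softmax(style_logits)
    pos = 0.0
    for w, coeffs in zip(weights, style_coeffs):
        pos += w * sum(c * t ** k for k, c in enumerate(coeffs))
    return pos


# Usage: a "calm" linear style and a "sporty" quadratic style, equally weighted.
x_pred = style_trajectory(2.0, [[0.0, 1.0], [0.0, 1.2, 0.3]], [0.0, 0.0])
```

Raising one style's logit shifts the prediction toward that style's curve, which is one way the mixture weights could be read as an explanation.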


Generalization Bounds for Kolmogorov-Arnold Networks (KANs) and Enhanced KANs with Lower Lipschitz Complexity

Neural Information Processing Systems

Kolmogorov-Arnold Networks (KANs) have demonstrated remarkable expressive capacity and predictive power in symbolic learning. However, existing generalization analyses of KANs primarily focus on approximation errors while neglecting estimation errors, leading to a suboptimal bias-variance trade-off and poor generalization performance. Meanwhile, the unclear generalization mechanism hinders the design of more effective KANs. As the authors of KANs highlighted, they "would like to explore ways to restrict KANs' hypothesis space so that they can achieve good performance." To address these challenges, we explore the generalization mechanism of KANs and design more effective KANs with lower model complexity and better generalization. We define Lipschitz complexity as the first structural measure for deep functions represented by KANs and derive novel generalization bounds based on it, establishing a theoretical foundation for understanding their generalization behavior. To reduce Lipschitz complexity and improve the generalization of KANs, we propose Lipschitz-Enhanced KANs (LipKANs), which integrate Lip layers and pioneer L1.5 regularization, contributing to tighter generalization bounds. Empirical experiments validate that the proposed LipKANs improve the generalization of KANs when modeling complex distributions. We hope our theoretical insights and proposed LipKANs lay a foundation for the future development of KANs.
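The abstract does not define the L1.5 regularizer. A natural reading, assumed here and not confirmed by the source, is an elementwise penalty λ·Σ|w_i|^1.5. It sits between L1 and L2: it still grows faster for large weights than L1, yet unlike L1 it is differentiable at zero, with gradient 1.5·λ·sign(w)·|w|^0.5. The function names are hypothetical.

```python
import math


def l15_penalty(weights, lam=1.0):
    """Assumed L1.5 regularizer: lam * sum_i |w_i| ** 1.5."""
    return lam * sum(abs(w) ** 1.5 for w in weights)


def l15_grad(weights, lam=1.0):
    """Gradient 1.5 * lam * sign(w) * sqrt(|w|); continuous and zero at w = 0."""
    return [1.5 * lam * math.copysign(math.sqrt(abs(w)), w) for w in weights]


# Usage: add the penalty to a data loss for a flat list of parameters.
params = [0.5, -2.0, 0.0]
total_loss = 0.1 + l15_penalty(params, lam=1e-3)
```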


Simultaneous Statistical Inference for Off-Policy Evaluation in Reinforcement Learning

Neural Information Processing Systems

This work presents the first theoretically justified simultaneous inference framework for off-policy evaluation (OPE). In contrast to existing methods that focus on point estimates or pointwise confidence intervals (CIs), the new framework quantifies global uncertainty across an infinite or continuous initial state space, offering valid inference over the entire state space.
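The abstract does not describe how the simultaneous intervals are constructed. For orientation only, here is a generic bootstrap sup-t band, one standard way to obtain simultaneous rather than pointwise coverage over a grid that discretizes the initial state space: a single critical value is calibrated on the largest standardized deviation across all states. The inputs (a point estimate and bootstrap replicates of the value function on the grid) and the function name are assumptions; this is not presented as the paper's procedure.

```python
import math
import random


def simultaneous_band(point_est, boot_draws, alpha=0.05):
    """Bootstrap sup-t band over a grid of initial states.

    point_est[s] is the estimated policy value at grid state s and
    boot_draws[b][s] is the b-th bootstrap replicate at that state.
    """
    n_states, n_boot = len(point_est), len(boot_draws)
    sd = []
    for s in range(n_states):
        vals = [draw[s] for draw in boot_draws]
        mean = sum(vals) / n_boot
        sd.append(math.sqrt(sum((v - mean) ** 2 for v in vals) / (n_boot - 1)))
    # Largest standardized deviation of each replicate over ALL states.
    max_dev = sorted(
        max(abs(draw[s] - point_est[s]) / sd[s] for s in range(n_states))
        for draw in boot_draws
    )
    crit = max_dev[math.ceil((1 - alpha) * n_boot) - 1]
    lower = [v - crit * w for v, w in zip(point_est, sd)]
    upper = [v + crit * w for v, w in zip(point_est, sd)]
    return lower, upper, crit


# Usage with synthetic replicates on an 11-point grid of initial states.
rng = random.Random(1)
grid = [i / 10 for i in range(11)]
est = [math.sin(g) for g in grid]
draws = [[v + rng.gauss(0.0, 0.1) for v in est] for _ in range(500)]
lower, upper, crit = simultaneous_band(est, draws)
```

Because one critical value must cover the maximum over all grid states, it is typically larger than the pointwise normal quantile; refining the grid approximates the continuous state space.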


Understanding Generalization in Physics Informed Models through Affine Variety Dimensions

Neural Information Processing Systems

Physics-informed machine learning is gaining significant traction for enhancing statistical performance and sample efficiency through the integration of physical knowledge. However, current theoretical analyses often presume complete prior knowledge in non-hybrid settings, overlooking the crucial integration of observational data, and are frequently limited to linear systems, whereas many real-world applications are nonlinear. To address these limitations, we introduce a residual form that unifies collocation and variational methods, enabling the incorporation of incomplete and complex physical constraints in hybrid learning settings. Within this formulation, we establish that the generalization performance of physics-informed regression in such hybrid settings is governed by the dimension of the affine variety associated with the physical constraint, rather than by the number of parameters. This enables a unified analysis that is applicable to both linear and nonlinear equations. We also present a method to approximate this dimension and provide experimental validation of our theoretical findings.
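To illustrate what the dimension of a variety measures, assume the physical constraint is written as equations g_i(θ) = 0 in a parameter vector θ. At a smooth point of the solution set, its local dimension equals the number of parameters minus the rank of the constraint Jacobian. The sketch estimates that rank numerically with central differences. This is a generic technique, not necessarily the approximation method the paper proposes, and the step size and tolerance are arbitrary choices.

```python
def numerical_jacobian(constraints, point, h=1e-6):
    """Central-difference Jacobian of the constraint map at `point`."""
    jac = []
    for g in constraints:
        row = []
        for j in range(len(point)):
            up, down = list(point), list(point)
            up[j] += h
            down[j] -= h
            row.append((g(up) - g(down)) / (2 * h))
        jac.append(row)
    return jac


def matrix_rank(rows, tol=1e-6):
    """Rank via Gaussian elimination with partial pivoting and a fixed tolerance."""
    A = [r[:] for r in rows]
    n_cols = len(A[0]) if A else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(A):
            break
        piv = max(range(rank, len(A)), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < tol:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        for r in range(rank + 1, len(A)):
            f = A[r][col] / A[rank][col]
            for c in range(col, n_cols):
                A[r][c] -= f * A[rank][c]
        rank += 1
    return rank


def local_dimension(constraints, point):
    """Number of parameters minus Jacobian rank: dimension near a smooth point."""
    return len(point) - matrix_rank(numerical_jacobian(constraints, point))


# Usage: two linear constraints in R^3 cut out a line; one quadratic gives a sphere.
two_planes = [lambda p: p[0] + p[1] + p[2] - 1.0, lambda p: p[0] - p[1]]
sphere = [lambda p: p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 1.0]
```

The same count applies to linear and nonlinear constraints alike, which mirrors the abstract's point that the analysis covers both.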


Scalable inference of functional neural connectivity at submillisecond timescales

Neural Information Processing Systems

The Poisson Generalized Linear Model (GLM) is a foundational tool for analyzing neural spike train data. However, standard implementations rely on discretizing spike times into binned count data, limiting temporal resolution and scalability. Here, we develop Monte Carlo (MC) methods and polynomial approximations (PA) to the continuous-time analog of these models, and show them to be advantageous over their discrete-time counterparts. Further, we propose using a set of exponentially scaled Laguerre polynomials as an orthogonal temporal basis, which improves filter identification and yields closed-form integral solutions under the polynomial approximation. Applied to both synthetic and real spike-time data from rodent hippocampus, our methods demonstrate superior accuracy and scalability compared to traditional binned GLMs, enabling functional connectivity inference in large-scale neural recordings that are temporally precise on the order of synaptic dynamical timescales and in agreement with known anatomical properties of hippocampal subregions. We provide open-source implementations of both MC and PA estimators, optimized for GPU acceleration, to facilitate adoption in the neuroscience community.
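A minimal sketch of the temporal basis and the continuous-time rate, assuming the standard scaled Laguerre functions φ_n(t) = √a·e^(−a·t/2)·L_n(a·t), where L_n is the n-th Laguerre polynomial and a > 0 is a time-scale parameter. These functions are orthonormal on [0, ∞). The paper's exact scaling, the filter parameterization, and all names below are assumptions. A coupling filter is a weighted sum of basis functions evaluated at the lag since each presynaptic spike, and the conditional intensity is exp(baseline + summed filter responses), evaluated at arbitrary times without binning.

```python
import math


def laguerre_basis(t, n_basis, scale=1.0):
    """phi_n(t) = sqrt(a) * exp(-a t / 2) * L_n(a t) for n = 0 .. n_basis - 1."""
    x = scale * t
    polys = [1.0, 1.0 - x][:n_basis]
    # Three-term recurrence: (n + 1) L_{n+1} = (2n + 1 - x) L_n - n L_{n-1}.
    for n in range(1, n_basis - 1):
        polys.append(((2 * n + 1 - x) * polys[n] - n * polys[n - 1]) / (n + 1))
    envelope = math.sqrt(scale) * math.exp(-x / 2)
    return [envelope * p for p in polys]


def coupling_filter(lag, weights, scale=1.0):
    """Filter value at a positive lag; causal, so zero for lag <= 0."""
    if lag <= 0:
        return 0.0
    phis = laguerre_basis(lag, len(weights), scale)
    return sum(w * phi for w, phi in zip(weights, phis))


def conditional_intensity(t, baseline, presyn_spikes, weights, scale=1.0):
    """Continuous-time Poisson GLM rate: exp(baseline + summed filter responses)."""
    drive = sum(coupling_filter(t - s, weights, scale) for s in presyn_spikes)
    return math.exp(baseline + drive)


# Usage: rate 3 ms after a presynaptic spike at t = 10 ms (hypothetical weights, ms units).
rate = conditional_intensity(13.0, baseline=-2.0, presyn_spikes=[10.0],
                             weights=[0.8, -0.3, 0.1], scale=0.5)
```

Orthonormality keeps the filter weights decoupled, which is one reason an orthogonal basis can ease filter identification.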


Adaptive RBF-KAN: A Comparative Evaluation of Dynamic Shape Parameters in Kolmogorov-Arnold Networks

arXiv.org Machine Learning

Kolmogorov-Arnold Networks (KANs) approximate multivariate functions using learnable univariate edge functions, typically parameterized by B-spline bases. Although effective, spline-based implementations can be computationally expensive. A modified version of KANs, called FastKAN, improves efficiency by replacing splines with Gaussian radial basis functions (RBFs), but it relies on a fixed kernel and shape parameter. In this work, we extend the RBF-based KAN framework by introducing a broader family of radial basis kernels and by initializing the kernel shape parameter using leave-one-out cross-validation (LOOCV). To the best of our knowledge, this is the first study that integrates LOOCV-based kernel scale estimation with deep KAN training. We also introduce Matérn and Wendland kernels into the KAN framework for the first time, enabling more flexible basis representations beyond the Gaussian kernel used in FastKAN. The LOOCV estimate provides a data-driven initialization of the kernel scale, which is subsequently refined during network training. The proposed adaptive RBF-KAN is evaluated on several two-dimensional benchmark functions. The results highlight the importance of kernel selection and adaptive shape parameters, with different kernels showing advantages for smooth functions, discontinuities, and oscillatory patterns. Overall, combining LOOCV-based initialization with adaptive kernel learning provides a practical strategy for improving RBF-based KAN models.
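LOOCV for an RBF shape parameter can be made cheap with Rippa's closed-form shortcut for RBF interpolation: if A is the kernel matrix and c = A⁻¹y the interpolation coefficients, the leave-one-out error at point k is c_k / (A⁻¹)_kk, so no refitting is needed. The sketch below applies it to plain 1-D Gaussian-RBF interpolation to pick an initial shape parameter from a grid, and checks it against brute-force refitting. How the paper couples this with KAN training, and which kernels it scores, are not reproduced here; the function names are hypothetical.

```python
import math


def gaussian(r, eps):
    return math.exp(-(eps * r) ** 2)


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def kernel_matrix(xs, eps):
    return [[gaussian(abs(p - q), eps) for q in xs] for p in xs]


def loocv_errors(xs, ys, eps):
    """Rippa's shortcut: e_k = c_k / (A^{-1})_{kk}, with c = A^{-1} y."""
    A = kernel_matrix(xs, eps)
    coef = solve(A, ys)
    errs = []
    for k in range(len(xs)):
        unit = [1.0 if i == k else 0.0 for i in range(len(xs))]
        errs.append(coef[k] / solve(A, unit)[k])  # solve(A, e_k)[k] = (A^{-1})_kk
    return errs


def brute_force_loocv(xs, ys, eps):
    """Reference: refit without point k, then predict at x_k."""
    errs = []
    for k in range(len(xs)):
        xr, yr = xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:]
        c = solve(kernel_matrix(xr, eps), yr)
        pred = sum(cj * gaussian(abs(xs[k] - xj), eps) for cj, xj in zip(c, xr))
        errs.append(ys[k] - pred)
    return errs


def select_shape(xs, ys, grid):
    """Initial shape parameter: grid value with the smallest LOOCV sum of squares."""
    return min(grid, key=lambda eps: sum(e * e for e in loocv_errors(xs, ys, eps)))


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(2 * math.pi * x) for x in xs]
eps0 = select_shape(xs, ys, grid=[1.0, 2.0, 3.0, 4.0, 5.0])
```

In an adaptive RBF-KAN, a value like eps0 would only seed the shape parameter, which training then continues to adjust.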


A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning

arXiv.org Machine Learning

Spatially referenced datasets have become increasingly prevalent across many fields, largely driven by advances in data collection methods such as satellite remote sensing. In many applications, predictions at unobserved locations must be accompanied by reliable uncertainty estimates. While deep learning methods provide both scalable and accurate models for spatial predictions, there remains no clear consensus on how to address uncertainty quantification in spatial deep learning. Monte Carlo (MC) dropout has become a popular approach for uncertainty quantification, yet existing implementations typically focus on tuning the dropout rate while fixing other influential hyperparameters, such as weight decay and the predictive standard deviation multiplier, often through ad-hoc or manual tuning. We propose a cubing-based diagnostic framework that recursively partitions the hyperparameter space to identify stable regions where MC dropout yields well-calibrated predictive intervals. The approach evaluates hyperparameter regions using scoring rules relative to a statistical baseline model, which serves as a calibration anchor. Through a simulation study spanning multiple spatial dependence regimes as well as a large remotely-sensed land surface temperature dataset, we demonstrate that our approach produces competitive or superior predictive intervals compared to the baseline model. Our methodology provides practitioners with a systematic procedure for incorporating uncertainty quantification into spatial deep learning models.
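A minimal sketch of the recursive partitioning idea, under assumptions the abstract leaves open: the hyperparameter box is halved along every axis, each sub-cube is scored at its center, and only sub-cubes scoring no worse than the baseline model are refined further. In practice the score would be a scoring rule (for example an interval score) computed from MC dropout predictions; here a toy quadratic stands in for it, and stable_regions, toy_score, and the baseline threshold are hypothetical.

```python
from itertools import product


def split(cube):
    """Halve every side of an axis-aligned box given as [(lo, hi), ...]."""
    halves = [((lo, (lo + hi) / 2), ((lo + hi) / 2, hi)) for lo, hi in cube]
    return [list(sub) for sub in product(*halves)]


def center(cube):
    return tuple((lo + hi) / 2 for lo, hi in cube)


def stable_regions(cube, score, baseline, depth):
    """Recursively keep sub-cubes whose center scores no worse than the baseline."""
    if depth == 0:
        return [cube]
    kept = []
    for sub in split(cube):
        if score(center(sub)) <= baseline:
            kept.extend(stable_regions(sub, score, baseline, depth - 1))
    return kept


# Toy stand-in for a scoring rule over (dropout rate, weight decay), both
# rescaled to [0, 1]; lower is better, and `baseline` plays the baseline model's score.
def toy_score(p):
    return sum((v - 0.3) ** 2 for v in p)


regions = stable_regions([(0.0, 1.0), (0.0, 1.0)], toy_score, baseline=0.05, depth=3)
```

Each level multiplies the number of candidate cubes by 2^d for d hyperparameters, so pruning against the baseline is what keeps the search tractable.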


Improving the Efficiency of Subgroup Analysis in Randomized Controlled Trials with TMLE

arXiv.org Machine Learning

Subgroup analyses within randomized controlled trials are often underpowered due to limited sample sizes. We address this challenge by leveraging trial participants outside the subgroup of interest to augment estimation within the subgroup. Specifically, we study two Targeted Maximum Likelihood Estimators (TMLEs) that borrow information from non-subgroup participants within the same trial: a TMLE with pooled regression (TMLE-PR) and an Adaptive Targeted Maximum Likelihood Estimator (A-TMLE). Both estimators enable information sharing without relying on any external real-world data, thereby capitalizing on key strengths of the trial: most importantly, the protection against bias afforded by the randomized treatment, but also harmonized data collection, and consistent treatment and outcome definitions. The general strategy proposed here directly advances the priorities of key regulatory agencies, including the FDA, by improving the precision of subgroup-specific treatment effect estimates without introducing external sources of bias, thereby facilitating rigorous inference to support equitable labeling, access, and post-market evaluation. In a case study based on analysis of data from a cardiovascular outcome trial (LEADER, NCT01179048), we estimate the risk reduction of major adverse cardiac events (MACE) under liraglutide treatment among Black and Asian subgroups (each comprising less than 10% of the trial population) using the proposed estimators that borrow information from the remainder of the trial. Using A-TMLE, in particular, we find estimated absolute MACE risk reductions of 1.6, 1.5, and 1.5 percentage points among Asian participants and 2.1, 2.0, and 2.1 percentage points among Black participants at 365, 540, and 730 days, respectively, with 95% confidence intervals excluding the null at each time point.


Covariate Balancing and Riesz Regression Should Be Guided by the Neyman Orthogonal Score in Debiased Machine Learning

arXiv.org Machine Learning

This position paper argues that, in debiased machine learning, balancing functions should be derived from the Neyman orthogonal score, not chosen only as functions of covariates. Covariate balancing is effective when the regression error entering the score can be represented by functions of covariates alone, and it is the natural finite-dimensional approximation for targets such as ATT counterfactual means. For ATE estimation under treatment effect heterogeneity, however, the score error generally contains treatment-specific components because the outcome regression is a function of the full regressor $X=(D,Z)$. In that case, balancing common functions of $Z$ can leave the treatment-specific component unbalanced. We therefore advocate regressor balancing, implemented by Riesz regression with basis functions of $X$, as the general balancing principle for DML. The position is not that covariate balancing is invalid, but that covariate balancing should be understood as the special case that is appropriate when the score-relevant regression error is a function of covariates alone.
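A minimal sketch of regressor balancing for the ATE, assuming a linear Riesz model α(X) = b(X)ᵀβ with basis functions of the full regressor X = (D, Z). Riesz regression minimizes the empirical loss mean[α(X)² − 2(α(1, Z) − α(0, Z))], whose minimizer solves Gβ = h with G = mean[b(X)b(X)ᵀ] and h = mean[b(1, Z) − b(0, Z)]. With a saturated basis of indicators for the (D, Z) cells and a binary Z (both assumptions for illustration), the fitted representer reproduces the inverse-propensity weights D/ê(Z) − (1 − D)/(1 − ê(Z)) exactly, including their treatment-specific components. The function names and the simulated data are hypothetical.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def basis(d, z):
    """Hypothetical saturated basis of X = (D, Z); index 2*d + z marks the (d, z) cell."""
    return [float(d == dd and z == zz) for dd in (0, 1) for zz in (0, 1)]


def riesz_regression(data):
    """Fit alpha(X) = basis(X) . beta by solving G beta = h for the ATE functional."""
    k, n = len(basis(0, 0)), len(data)
    G = [[0.0] * k for _ in range(k)]
    h = [0.0] * k
    for d, z in data:
        b, b1, b0 = basis(d, z), basis(1, z), basis(0, z)
        for i in range(k):
            h[i] += (b1[i] - b0[i]) / n
            for j in range(k):
                G[i][j] += b[i] * b[j] / n
    return solve(G, h)


def simulate(n, seed=0):
    """Toy data: binary Z, treatment probability 0.3 or 0.7 depending on Z."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        d = int(rng.random() < (0.7 if z else 0.3))
        data.append((d, z))
    return data


data = simulate(2000)
beta = riesz_regression(data)  # beta[2*d + z] is alpha(d, z)
```

With a richer, non-saturated basis of X the same solve yields an approximate representer rather than exact inverse-propensity weights.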